Text File | 1984-11-04 | 16KB | 440 lines
Lancelot documentation for programming the analyzer
---------------------------------------------------
Lancelot analyzes language by interpreting commands given in an
analysis file. The commands are called actions, and each has a particular form.
Each action starts with a word called a dot-dot command (e.g., ..WORD),
followed by a print message and then by one or more elements as
defined below.
Several sample (and useful) analysis files are included with Lancelot,
and typical usage should be inferred from reading them. They do not all use
all of the options of Lancelot, and so are not a substitute for reading this
documentation.
The commands are given below in alphabetical order. Occasionally
there are common points between the commands, and they are discussed in a note
to the reader. The syntactic form of a command is followed by the semantics
of the command.
Definitions
Print messages
Print messages contain an arbitrary number of text lines, which may or
may not be printed depending upon where the message appears in an action. A print
message begins on a line that starts with a colon, and continues to the next
line that starts with a dot or colon. Print messages may contain variables
whose value will be substituted when the message is printed.
Occasionally you may wish to print nothing in a print message. You
may accomplish that using the following print message which is called the
empty print message:
:.
Blank lines before a print message are ignored, while blank lines
after a print message are included in the print message. This happens
because a print message does not stop until the next line that starts with a
dot or colon.
Variables
Variables are words that start with a sharp sign (#). Lancelot
defines a number of variables that are set by various dot-dot commands. The
value that the variable takes on is described below with the dot-dot command
that sets the variable. You may define your own variables using the ..CALC
command. These variables may be used in the same manner as Lancelot's
variables. Lancelot's variables are all uppercase to make them stand out from
the rest of the print message. We suggest that your variables be given in
uppercase also. Case is significant in variable names.
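As an illustration of the substitution described above, the following Python
sketch replaces each variable in a print message with its value. It is not
Lancelot's implementation. The function name expand_message, the rule that a
variable name consists of letters, digits, and underscores, and the choice to
leave undefined variables untouched are all assumptions made for this example.
#COUNT and #LINE are variables set by the ..ALLCOND command.

```python
import re


def expand_message(message, variables):
    """Substitute #NAME variables in a print message (illustrative sketch).

    Assumptions, not taken from the Lancelot documentation:
    - a variable name is '#' followed by letters, digits, or underscores;
    - a variable with no value is left in the message unchanged.
    Names are compared exactly, so case is significant.
    """
    def replace(match):
        name = match.group(0)
        # Fall back to the variable's own text when it has no value.
        return str(variables.get(name, name))

    return re.sub(r"#[A-Za-z_][A-Za-z0-9_]*", replace, message)


values = {"#COUNT": 3, "#LINE": 17}
print(expand_message("Found #COUNT times; last on line #LINE.", values))
```

With the values shown, the call prints "Found 3 times; last on line 17."
Because case is significant, "#count" would not be replaced by the value of
"#COUNT".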
Numbers
Some of the dot-dot commands have numbers as arguments. When the syntax
of the dot-dot command is given, the numbers are given names and enclosed in
angle brackets (<>). For example, <lower_bound> means a number whose value
is used as a lower bound.
Literal words and search words
Lancelot has two concepts of a word, the literal word and the search
word. The literal word is a word that must occur literally in the text, and
is used by the ..MOST, ..SIZE, ..SHOW, and ..WORD commands. Literal words are
not sensitive to case. Search words are used by phrase matching and
conditional matching dot-dot commands, and are more powerful (and slower) than
literal words.
If a letter in a search word is capitalized, that letter matches
either upper or lower case text letters. A text word is terminated by one of
the following characters: .?!,;)<space><tab><return><linefeed>. If a search
word contains a star (*), the star matches as many text characters as
possible (possibly none, as the examples show). If a search word contains a
percent (%), then that percent matches one or more of the following
characters: <space><tab><cr><lf>. Percent characters are useful only in
conditional matching. Examples:
Analysis    Matching Text
--------    -------------
*ing        ing
            ring
            cling
            string
Crawl*      crawl
            Crawl    {note that C matches both upper and lower case}
            crawled
            crawling
a*c         ac
            abc
            abbc
            abbbc
            abbbbc
in%to       "in to"
            "in to"
            "in
            to"
            "in
            to"
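The matching rules for search words can be sketched in Python by translating
a search word into a regular expression. This is only an illustration of the
rules as stated above, not Lancelot's own algorithm. The function names
search_word_to_regex and matches are hypothetical, and the sketch assumes that
a lowercase letter in a search word matches only a lowercase text letter and
that a star never matches across a word terminator.

```python
import re


def search_word_to_regex(search_word):
    """Translate a search word into a regular expression (illustrative sketch)."""
    parts = []
    for ch in search_word:
        if ch == "*":
            # Star: as many characters as possible, possibly none.
            # Assumption: it stops at the word terminators .?!,;) and whitespace.
            parts.append(r"[^.?!,;) \t\r\n]*")
        elif ch == "%":
            # Percent: one or more of space, tab, carriage return, linefeed.
            parts.append(r"[ \t\r\n]+")
        elif ch.isalpha() and ch.isupper():
            # Capitalized letter: matches either case.
            parts.append("[" + ch.lower() + ch + "]")
        else:
            # Assumption: any other character must match exactly.
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def matches(search_word, text):
    """Return True if the whole of text matches the search word."""
    return search_word_to_regex(search_word).fullmatch(text) is not None


for word, text in [("*ing", "string"), ("Crawl*", "crawled"),
                   ("a*c", "ac"), ("in%to", "in\nto"), ("in%to", "into")]:
    print(word, repr(text), matches(word, text))
```

Every pair in the loop reports True except the last one, because the percent
requires at least one space, tab, carriage return, or linefeed between "in"
and "to".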
Syntax of a Lancelot Action
----------------------------
1. All actions start with a dot-dot command which determines the action to
occur. For example: ..WORD means that single words are to be searched
in the text. ..ALLCOND means that all sentences of the text are to be
searched to determine if they contain a specified word or sequence of
words.
2. The second element of an action is an unconditional print message which is
always printed (don't forget that :. prints nothing.)
3. The third element of an action is one or more literal words or search
words which are to be acted upon. The particular action taken depends
upon the dot-dot command.
4. The fourth element is a conditional print message that is printed only if
the literal or search word(s) were found.
5. Optionally, most dot-dot commands allow the pair of element 3 and
element 4 to be repeated any number of times.
6. An action is terminated by another dot-dot command.
For example:
..allcond {Element 1}
:Your text is now being searched for three word constructions. {Element 2}
.Ha* .preference {Element 3}
:The text says "has(have)...a preference." Say "prefer." {Element 4}
.Spell* .out {Element 3}
:The text says "spell(s)...out." Say "explain(s)." {Element 4}
.Take* .consideration
:The text says "take(s)...consideration." Say "consider(s)."
..end {Element 6}
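The structure described by the six points above can be sketched as a small
Python reader that splits an analysis file into actions. This is a reading
aid under stated assumptions, not Lancelot's parser. The function name
parse_actions and the dictionary layout are hypothetical. The sketch also
treats ..end as an ordinary dot-dot command, and it does not handle the brace
annotations such as {Element 1} shown in the example, since the text does not
say whether they are allowed in a real analysis file.

```python
def parse_actions(source):
    """Split analysis-file text into actions (illustrative sketch)."""
    actions = []
    message = None  # lines of the print message currently being collected
    for line in source.splitlines():
        if line.startswith(".."):
            # A dot-dot command starts a new action and ends the previous one.
            actions.append({"command": line[2:].strip().upper(), "elements": []})
            message = None
        elif not actions:
            continue  # assumption: text before the first command is ignored
        elif line.startswith(":"):
            if line.rstrip() == ":.":
                # The empty print message prints nothing.
                message = None
                actions[-1]["elements"].append(("message", []))
            else:
                message = [line[1:]]
                actions[-1]["elements"].append(("message", message))
        elif line.startswith("."):
            # Literal or search words (element 3).
            actions[-1]["elements"].append(("words", line.split()))
            message = None
        elif message is not None:
            # Any other line, including a blank one, continues the message.
            message.append(line)
    return actions


example = """..allcond
:Your text is now being searched.
.Ha* .preference
:Say "prefer."
..end
"""
for action in parse_actions(example):
    print(action["command"], action["elements"])
```

Blank lines that come before a print message are dropped because no message
is open at that point, while blank lines after one are kept as part of it.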
Lancelot Dot-dot Commands
-------------------------------------------------------------
..allcond
:print message (unconditional)
.word1 .word2 .word3 ...
:print message (conditional upon finding a phrase containing the above words)
.optional next element 3
:optional next element 4
The text is searched for all sentences that contain search word .WORD1
followed by zero or more words followed by any one of .WORD2, .WORD3, ....
If the element consists of just .WORD1, then those sentences that contain
.WORD1 will be accepted and the conditional print message will be printed.
If the element contains .WORD1 and .WORD2, then a sentence is accepted only if
it contains both .WORD1 and .WORD2 in that order. Note that .WORD1
may be an entire text phrase by using the percent character.
When a sentence is found that is accepted and the conditional print
message is not empty, not only is the element 4 print message printed but
also a window of three text lines is displayed to the user. This window
surrounds the place in the text where the phrase was found.
This action is repeated until all the occurrences in all sentences have
been found. The next element 3 is then used for the search. Multiple
elements in a single ..ALLCOND are equivalent to the same elements in multiple
..ALLCOND actions.
The variable #COUNT is set to the number of times the phrase is found,
and #LINE is set to the last line in which the phrase is found. Note that
#LINE is set before the print message is printed, so the print message can
print the line in which the phrase was found.
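The acceptance rule for ..ALLCOND can be sketched in Python as follows. This
is a simplified illustration, not Lancelot's implementation. The function
names word_regex and allcond_count are hypothetical. The sketch assumes that a
sentence ends at a period, question mark, or exclamation point, and it counts
accepted sentences rather than individual occurrences. It works word by word,
so the percent character is not supported, and it does not track #LINE or
display the three-line window.

```python
import re


def word_regex(search_word):
    """Search word -> regex for one text word (star and capitals only)."""
    out = []
    for ch in search_word.lstrip("."):
        if ch == "*":
            out.append(r"[^.?!,;) \t\r\n]*")  # star: rest of the text word
        elif ch.isalpha() and ch.isupper():
            out.append("[" + ch.lower() + ch + "]")  # capital: either case
        else:
            out.append(re.escape(ch))
    return re.compile("".join(out))


def allcond_count(text, element):
    """Count sentences with element[0] followed later by any of element[1:]."""
    first = word_regex(element[0])
    rest = [word_regex(w) for w in element[1:]]
    count = 0
    # Assumption: '.', '?' and '!' end a sentence.
    for sentence in re.split(r"[.?!]", text):
        # Text words are runs of characters between word terminators.
        words = re.findall(r"[^.?!,;) \t\r\n]+", sentence)
        for i, word in enumerate(words):
            if not first.fullmatch(word):
                continue
            later = words[i + 1:]
            if not rest or any(r.fullmatch(w) for r in rest for w in later):
                count += 1  # sentence accepted
                break
    return count


text = "We have a strong preference for tea. She has no preference. They prefer coffee."
print(allcond_count(text, [".Ha*", ".preference"]))
```

For the sample text the sketch prints 2: the first two sentences contain a
word matching Ha* followed later by "preference", and the third does not.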
-------------------------------------------------------------
..allphrase
:print message (unconditional)
.word1 word2 word3 ... (NOTE: no dots - this is a phrase)
:print message (conditional upon finding the phrase.)
.optional next element 3
:optional next element 4
The text is searched for all sentences that contain the phrase WORD1
followed by WORD2, followed by WORD3 ... If a sentence contains the
phrase of element 3 then the sentence is accepted.
When a sentence is found that is accepted and the conditional print
message is not empty, not only is the element 4 print message printed but
a